Asynchronous Optimistic Rollback Recovery Using Secure Distributed Time
Abstract
In an asynchronous distributed computation, processes may fail and restart from saved state. A protocol for optimistic rollback recovery must recover the system even when surviving processes may depend on states that were lost at failed processes. Previous work has used forms of partial order clocks to track potential causality. Our research addresses two crucial shortcomings of that work: first, the rollback problem also requires tracking a second level of partial order time (potential knowledge of failures and rollbacks); second, protocols based on partial order clocks carry inherent security and privacy risks. We have developed a distributed time framework that provides tools both for multiple levels of time abstraction and for identifying and solving the corresponding security and privacy risks. This paper applies our framework to the rollback problem. We derive a new optimistic rollback recovery protocol that provides completely asynchronous recovery (thus directly supporting concurrent recovery and tolerating network partitions) and that enables processes to take full advantage of their maximum potential knowledge of orphans (thus reducing the worst-case bound on asynchronous recovery after a single failure from exponential to at most one rollback per process). By explicitly tracking and utilizing both levels of partial order time, our protocol substantially improves on previous work in optimistic recovery. Our work also provides a foundation for incorporating security and privacy in optimistic rollback recovery.
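The abstract does not spell out the protocol itself. As a rough illustration only, the Python sketch below shows the kind of partial order bookkeeping it refers to, in the style of earlier incarnation-numbered dependency vectors rather than the authors' protocol. Each process carries a vector of (incarnation, state index) entries as its first level of time, and separately records the rollback announcements it has learned about, a deliberately simplified stand-in for the second level. A state is treated as an orphan if it depends on a state that a rollback announcement declares lost. The names `Process`, `Announcement`, `step`, `send`, `receive`, and `learn` are hypothetical, and the model ignores checkpointing, message logging and replay, restarts, and all security concerns.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple

# One dependency entry: (incarnation, state_index) of some process.
Entry = Tuple[int, int]


@dataclass(frozen=True)
class Announcement:
    """Hypothetical rollback notice: process `pid` recovered incarnation
    `incarnation` only up to `restored_index`; its later states are lost."""
    pid: int
    incarnation: int
    restored_index: int


@dataclass
class Process:
    pid: int
    n: int
    # First level of partial order time: dep[j] is the latest state of
    # process j that this process's current state may causally depend on.
    dep: List[Entry] = field(default_factory=list)
    # Simplified second level: rollback announcements this process has learned.
    known: Dict[int, List[Announcement]] = field(default_factory=dict)

    def __post_init__(self) -> None:
        if not self.dep:
            self.dep = [(0, 0)] * self.n

    def step(self) -> None:
        """Advance to a new local state in the current incarnation."""
        inc, idx = self.dep[self.pid]
        self.dep[self.pid] = (inc, idx + 1)

    def send(self) -> List[Entry]:
        """Start a new state and piggyback a copy of the dependency vector."""
        self.step()
        return list(self.dep)

    def receive(self, msg_dep: List[Entry]) -> None:
        """Merge entry-wise (tuples compare incarnation first), then step."""
        self.dep = [max(a, b) for a, b in zip(self.dep, msg_dep)]
        self.step()

    def learn(self, ann: Announcement) -> bool:
        """Record a rollback announcement; return True if the current
        state depends on a lost state, i.e. it is an orphan."""
        self.known.setdefault(ann.pid, []).append(ann)
        inc, idx = self.dep[ann.pid]
        return inc == ann.incarnation and idx > ann.restored_index


# Demo: p0 -> p1 -> p2, then p0 fails and restores only its initial state.
p0, p1, p2 = (Process(i, 3) for i in range(3))
p1.receive(p0.send())  # p1 depends on p0's state (0, 1)
p2.receive(p1.send())  # p2 depends on p0's state (0, 1) transitively
ann = Announcement(pid=0, incarnation=0, restored_index=0)
print(p1.learn(ann), p2.learn(ann))  # True True
```

Because dependencies propagate transitively through the piggybacked vectors, `p2` can recognize itself as an orphan from the announcement alone, without contacting `p0` or `p1`. This is the sort of local, asynchronous orphan detection the abstract describes, though the paper's actual protocol and its bounds are not reproduced here.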
Similar resources
Completely Asynchronous Optimistic Recovery with Minimal Rollbacks
Consider the problem of transparently recovering an asynchronous distributed computation when one or more processes fail. Basing rollback recovery on optimistic message logging and replay is desirable for several reasons, including not requiring synchronization between processes during failure-free operation. However, previous optimistic rollback recovery protocols either have required synchroni...
Minimizing Timestamp Size for Completely Asynchronous Optimistic Recovery with Minimal Rollback (Proceedings of the 15th Symposium on Reliable Distributed Systems, 1996)
Basing rollback recovery on optimistic message logging and replay avoids the need for synchronization between processes during failure-free execution. Some previous research has also attempted to reduce the need for synchronization during recovery, but these protocols have suffered from three problems: not eliminating all synchronization during recovery, not minimizing rollback, or providing th...
Efficient Transparent Optimistic Rollback Recovery for Distributed Application Programs
Existing rollback-recovery methods using consistent checkpointing may cause high overhead for applications that frequently send output to the “outside world,” since a new consistent checkpoint must be written before the output can be committed, whereas existing methods using optimistic message logging may cause large delays in committing output, since processes may buffer received messages arbi...
Rollback Overhead Reduction Methods for Time Warp Distributed Simulation
Parallel discrete event simulation is a useful technique to improve performance of sequential discrete event simulation. We consider the Time Warp algorithm for asynchronous distributed discrete event simulation. Time Warp is an optimistic synchronization mechanism for asynchronous distributed systems that allows a system to violate the synchronization constraint and, in this case, make the sys...
Publication date: 1994